Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields

نویسندگان

  • Gideon S. Mann
  • Andrew McCallum
چکیده

This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and adhoc feature majority label assignment. The use of generalized expectation criteria allows for a dramatic reduction in annotation time by shifting from traditional instance-labeling to feature-labeling, and the methods presented outperform traditional CRF training and other semi-supervised methods when limited human effort is available.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

In this paper, we present an overview of generalized expectation criteria (GE), a simple, robust, scalable method for semi-supervised training using weakly-labeled data. GE fits model parameters by favoring models that match certain expectation constraints, such as marginal label distributions, on the unlabeled data. This paper shows how to apply generalized expectation criteria to two classes ...

متن کامل

Painless Semi-Supervised Morphological Segmentation using Conditional Random Fields

We discuss data-driven morphological segmentation, in which word forms are segmented into morphs, that is the surface forms of morphemes. We extend a recent segmentation approach based on conditional random fields from purely supervised to semi-supervised learning by exploiting available unsupervised segmentation techniques. We integrate the unsupervised techniques into the conditional random f...

متن کامل

On the Use of Virtual Evidence in Conditional Random Fields

Virtual evidence (VE), first introduced by (Pearl, 1988), provides a convenient way of incorporating prior knowledge into Bayesian networks. This work generalizes the use of VE to undirected graphical models and, in particular, to conditional random fields (CRFs). We show that VE can be naturally encoded into a CRF model as potential functions. More importantly, we propose a novel semisupervise...

متن کامل

Alternating Projections for Learning with Expectation Constraints

We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternate interpretation of the posterior regularization framework (Graca et al., 2008), maintains uncertainty during optimization unlike constra...

متن کامل

Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods

In this paper we consider the problem of labeling the languages of words in mixed-language documents. This problem is approached in a weakly supervised fashion, as a sequence labeling problem with monolingual text samples for training data. Among the approaches evaluated, a conditional random field model trained with generalized expectation criteria was the most accurate and performed consisten...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008